Spock - a Spoken Corpus Client
نویسندگان
چکیده
Spock is an open source tool for the easy deployment of time-aligned corpora. It is fully web-based, and has very limited server-side requirements. It allows the end-user to search the corpus in a text-driven manner, obtaining both the transcription and the corresponding sound fragment in the result page. Spock has an administration environment to help manage the sound files and their respective transcription files, and also provides statistical data about the files at hand. Spock uses a proprietary file format for storing the alignment data but the integrated admin environment allows you to import files from a number of common file formats. Spock is not intended as a transcriber program: it is not meant as an alternative to programs such as ELAN, Wavesurfer, or Transcriber, but rather to make corpora created with these tools easily available on line. For the end user, Spock provides a very easy way of accessing spoken corpora, without the need of installing any special software, which might make time-aligned corpora corpora accessible to a large group of users who might otherwise never look at them.
منابع مشابه
Annotations for Dynamic Diagnosis of the Dialog State
This paper describes recent work aimed at relating multi-level dialog annotations with meta-data annotations for a corpus of real humanhuman dialogs. This work is carried out in the context of the AMITIES project in which spoken dialog systems for call center services are being developed. A corpus of 100 agent-client dialogs have been annotated with three types of annotations. The first are utt...
متن کاملEl-WOZ: a client-server wizard-of-oz interface
In this paper, we present a speech recording interface developed in the context of a project on automatic speech recognition for elderly native speakers of European Portuguese. In order to collect spontaneous speech in a situation of interaction with a machine, this interface was designed as a Wizard-of-Oz (WOZ) plateform. In this setup, users interact with a fake automated dialog system contro...
متن کاملVague Language and Interpersonal Communication: An Analysis of Adolescent Intercultural Conversation
This paper is concerned with the analysis of the spoken language of teenagers, taken from a newly developed specialised corpus the British and Taiwanese Teenage Intercultural Communication Corpus (BATTICC). More specifically, the study employs a discourse analytical approach to examine vague language in an intercultural context among a group of British and Taiwanese adolescents, paying particul...
متن کاملThe Effect of CMC in Business Emails in Lingua Franca: Discourse Features and Misunderstandings
The paper argues that everyday exchange of business emails produces a development in the work-group relationship, which, in turn, makes new communication styles possible and acceptable by the users' habit to computer-mediated forms, even in unbalanced professional exchanges. The focus is on the (spoken) discourse features of email messages in a self-compiled corpus of selected computer-mediated...
متن کاملThe Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
متن کامل